Results 1 - 20 of 1,985
1.
Nat Commun ; 15(1): 3942, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38729933

ABSTRACT

In clinical oncology, many diagnostic tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming. In this paper, we propose a self-supervised framework (enVironment-aware cOntrastive cell represenTation learning: VOLTA) for cell representation learning in histopathology images using a technique that accounts for the cell's mutual relationship with its environment. We subject our model to extensive experiments on data collected from multiple institutions comprising over 800,000 cells and six cancer types. To showcase the potential of our proposed framework, we apply VOLTA to ovarian and endometrial cancers and demonstrate that our cell representations can be used to identify the known histotypes of ovarian cancer and provide insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised models, our framework can empower discoveries without any annotation data, even in situations where sample sizes are limited.
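The contrastive objective family that VOLTA belongs to can be illustrated with a generic InfoNCE loss. This is a minimal stdlib-only sketch, assuming one anchor cell embedding, one positive (another augmented view of the same cell) and a list of negatives (other cells); the function names and the temperature of 0.1 are illustrative assumptions, and VOLTA's environment-aware loss itself is not reproduced here.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor (hypothetical helper, not VOLTA's code).

    Returns -log(softmax score of the positive pair among all candidate pairs).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    print(info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))  # small loss
    print(info_nce(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]]))  # large loss
```

The loss is low when the anchor is closer to its positive than to every negative, which is the property a contrastive representation learner exploits without labels.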


Subject(s)
Endometrial Neoplasms , Ovarian Neoplasms , Humans , Female , Endometrial Neoplasms/pathology , Ovarian Neoplasms/pathology , Machine Learning , Supervised Machine Learning , Algorithms , Image Processing, Computer-Assisted/methods
2.
BMC Microbiol ; 24(1): 162, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730339

ABSTRACT

BACKGROUND: Coastal areas are subject to various anthropogenic and natural influences. In this study, we investigated and compared the characteristics of two coastal regions, Andhra Pradesh (AP) and Goa (GA), focusing on pollution, anthropogenic activities, and recreational impacts. We explored three main factors influencing the differences between these coastlines: the Bay of Bengal's shallower depth and lower salinity; upwelling phenomena due to the thermocline in the Arabian Sea; and high tides that can cause strong currents that transport pollutants and debris. RESULTS: The microbial diversity in GA was significantly higher than that in AP, which might be attributed to differences in temperature, soil type, and vegetation cover. 16S rRNA amplicon sequencing and bioinformatics analysis indicated the presence of diverse microbial phyla, including candidate phyla radiation (CPR). Statistical analysis, random forest regression, and supervised machine learning classification models accurately confirmed the microbiome diversity. Furthermore, we identified 450 cultures of heterotrophic, biotechnologically important bacteria. Some strains were identified as novel taxa based on 16S rRNA gene sequencing, showing promising potential for further study. CONCLUSION: Thus, our study provides valuable insights into the microbial diversity and pollution levels of coastal areas in AP and GA. These findings contribute to a better understanding of the impact of anthropogenic activities and climate variations on the biology of coastal ecosystems and on biodiversity.


Subject(s)
Bacteria , Bays , Microbiota , Phylogeny , RNA, Ribosomal, 16S , Seawater , Supervised Machine Learning , RNA, Ribosomal, 16S/genetics , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Microbiota/genetics , Seawater/microbiology , India , Bays/microbiology , Biodiversity , DNA, Bacterial/genetics , Salinity , Sequence Analysis, DNA/methods
3.
Cell ; 187(10): 2502-2520.e17, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38729110

ABSTRACT

Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.


Subject(s)
Imaging, Three-Dimensional , Prostatic Neoplasms , Humans , Imaging, Three-Dimensional/methods , Prostatic Neoplasms/pathology , Prostatic Neoplasms/diagnostic imaging , Male , Prognosis , Deep Learning , X-Ray Microtomography/methods , Supervised Machine Learning
4.
PLoS One ; 19(5): e0299583, 2024.
Article in English | MEDLINE | ID: mdl-38696410

ABSTRACT

The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several previously published machine learning approaches utilized an erroneous KEGG-derived training dataset that used SMILES molecular representation strings (KEGG-SMILES dataset) and contained a sizable proportion (~26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by these erroneous duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that utilized this erroneous KEGG-SMILES dataset so their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction in model k-fold cross-validation (CV) performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting previously published benchmark datasets before using them in machine learning approaches. We hope others will avoid similar mistakes.
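The leakage mechanism described above can be checked with a few lines of code: count how many test items in each k-fold split also appear verbatim in the corresponding training folds, then de-duplicate and recount. This is a minimal sketch assuming records are exact `(SMILES, pathway label)` string pairs and folds are assigned round-robin without shuffling; the helper names are hypothetical, and a real pipeline would likely need to canonicalize SMILES with a cheminformatics toolkit first, since one molecule can have several SMILES spellings.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (SMILES string, pathway label); hypothetical layout


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep only the first occurrence of each exact (SMILES, label) entry."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Assign indices 0..n-1 round-robin to k folds (no shuffling, for clarity)."""
    return [list(range(i, n, k)) for i in range(k)]


def leaked_test_items(records: List[Record], k: int) -> int:
    """Count test items whose exact entry also appears in that fold's training set."""
    leaks = 0
    for test_idx in kfold_indices(len(records), k):
        test_set = set(test_idx)
        train = {records[i] for i in range(len(records)) if i not in test_set}
        leaks += sum(1 for i in test_idx if records[i] in train)
    return leaks


if __name__ == "__main__":
    data = [("CCO", "A"), ("CCO", "A"), ("CCN", "B"),
            ("CCC", "A"), ("CCC", "A"), ("CO", "B")]
    print(leaked_test_items(data, 3))               # leaks with duplicates
    print(leaked_test_items(deduplicate(data), 3))  # 0 after de-duplication
```

Any leaked test item is effectively scored on data the model was trained on, which is why CV performance drops once duplicates are removed.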


Subject(s)
Metabolic Networks and Pathways , Supervised Machine Learning , Humans , Datasets as Topic
5.
Sci Rep ; 14(1): 10820, 2024 05 11.
Article in English | MEDLINE | ID: mdl-38734825

ABSTRACT

Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. Addressing this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning. These techniques engage in auxiliary tasks that do not require labeling, thus simplifying the scaling of machine supervision compared to fully-supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, we observed that self-supervised learning significantly surpassed the performance of supervised methods in the classification of all evaluated datasets. Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods. The code can be accessed at https://github.com/pranavsinghps1/S4MI .


Subject(s)
Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Diagnostic Imaging/methods , Algorithms
6.
PeerJ ; 12: e17361, 2024.
Article in English | MEDLINE | ID: mdl-38737741

ABSTRACT

Phytoplankton, found in oceans, seas, and large water bodies, are the world's largest oxygen producers and play crucial roles in the marine food chain. Unbalanced biogeochemical features such as salinity, pH, and minerals can retard their growth. With advances in hardware, Artificial Intelligence techniques are increasingly used to create intelligent decision-making systems. Therefore, we attempt to address this gap by applying supervised regression to reanalysis data to estimate phytoplankton levels in global waters. The experiment applies several supervised machine learning regression techniques, namely random forest, extra trees, bagging, and histogram-based gradient boosting regressors, to reanalysis data obtained from the Copernicus Global Ocean Biogeochemistry Hindcast dataset. The models predicted phytoplankton levels with a coefficient of determination (R2) of up to 0.96. After further validation with larger datasets, the model could be deployed in a production environment to complement in-situ measurement efforts.
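The coefficient of determination reported above is R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean of the observed values. A minimal stdlib sketch of that formula follows; the example values are illustrative and not taken from the study.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]       # illustrative values only
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(r2_score(observed, predicted))  # close to 0.98
```

A score of 1.0 means perfect predictions, while always predicting the mean gives 0.0, so 0.96 indicates that most of the variance in phytoplankton levels is explained.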


Subject(s)
Machine Learning , Phytoplankton , Remote Sensing Technology , Remote Sensing Technology/methods , Remote Sensing Technology/instrumentation , Oceans and Seas , Environmental Monitoring/methods , Supervised Machine Learning
7.
Comput Biol Med ; 175: 108510, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38691913

ABSTRACT

BACKGROUND: Seizure prediction algorithms have demonstrated their potential in mitigating epilepsy risks by detecting the pre-ictal state from ongoing electroencephalogram (EEG) signals. However, most of them require high-density EEG, which is burdensome for patients during daily monitoring. Moreover, prevailing seizure models require extensive training with large amounts of labeled data, which is time-consuming and demanding for epileptologists. METHOD: To address these challenges, we propose an adaptive channel selection strategy to reduce the number of EEG channels and a semi-supervised deep learning model to limit the amount of labeled data required for accurate seizure prediction. Our channel selection module is centered on features from EEG power spectra parameterization that precisely characterize epileptic activity, and identifies the seizure-associated channels for each patient. The semi-supervised model integrates generative adversarial networks and bidirectional long short-term memory networks to enhance seizure prediction. RESULTS: Our approach is evaluated on the CHB-MIT and Siena epilepsy datasets. Using only 4 channels, the method achieves an AUC of 93.15% on the CHB-MIT dataset and an AUC of 88.98% on the Siena dataset. Experimental results also demonstrate that our selection approach reduces model parameters and training time. CONCLUSIONS: Adaptive channel selection coupled with semi-supervised learning can provide a basis for a lightweight and computationally efficient seizure prediction system, making daily monitoring practical and improving patients' quality of life.


Subject(s)
Electroencephalography , Seizures , Humans , Electroencephalography/methods , Seizures/physiopathology , Seizures/diagnosis , Signal Processing, Computer-Assisted , Deep Learning , Algorithms , Databases, Factual , Epilepsy/physiopathology , Supervised Machine Learning
8.
Comput Methods Programs Biomed ; 250: 108164, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38718709

ABSTRACT

BACKGROUND AND OBJECTIVE: Current automatic electrocardiogram (ECG) diagnostic systems can provide classification outcomes but often lack explanations for these results. This limitation hampers their application in clinical diagnosis. Previous supervised learning approaches could not highlight abnormal segments accurately enough for clinical application without manual labeling of large ECG datasets. METHOD: In this study, we present a multi-instance learning framework called MA-MIL, which uses a multi-layer, multi-instance structure that is aggregated step by step at different scales. We evaluated our method using the public MIT-BIH dataset and our private dataset. RESULTS: Our model performed well both in ECG classification and in heartbeat-level and sub-heartbeat-level abnormal segment detection, with accuracy and F1 scores of 0.987 and 0.986 for ECG classification and 0.968 and 0.949 for heartbeat-level abnormal detection, respectively. Compared to visualization methods, MA-MIL improved IoU values by at least 17 % and at most 31 % across all categories. CONCLUSIONS: MA-MIL can accurately locate abnormal ECG segments, offering more trustworthy results for clinical application.
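The IoU comparison above measures how well a predicted abnormal segment overlaps the annotated one: intersection length divided by union length. The abstract does not say how MA-MIL computes IoU (for example per-sample masks or per-category averaging), so this minimal sketch assumes each segment is a half-open `[start, end)` interval of ECG sample indices; `interval_iou` is a hypothetical helper name.

```python
from typing import Tuple

Segment = Tuple[int, int]  # half-open [start, end) in sample indices (assumption)


def interval_iou(pred: Segment, truth: Segment) -> float:
    """Intersection over union of two 1D segments."""
    intersection = max(0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A predicted abnormal segment that covers half of the annotated one.
    print(interval_iou((0, 100), (50, 150)))  # 50 / 150
```

Overlapping half of a same-length segment gives an IoU of one third rather than one half, which is why IoU is a strict measure of localization quality.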


Subject(s)
Algorithms , Electrocardiography , Supervised Machine Learning , Electrocardiography/methods , Humans , Heart Rate , Databases, Factual , Signal Processing, Computer-Assisted
9.
PLoS Comput Biol ; 20(4): e1012006, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38578796

ABSTRACT

Single-cell RNA sequencing (scRNASeq) data play a major role in advancing our understanding of developmental biology. An important current question is how to classify transcriptomic profiles obtained from scRNASeq experiments into the various cell types and identify the lineage relationships of individual cells. Because of the fast accumulation of datasets and the high dimensionality of the data, it has become challenging to explore and annotate single-cell transcriptomic profiles by hand. To overcome this challenge, automated classification methods are needed. Classical approaches rely on supervised training datasets. However, because data annotated at single-cell resolution are difficult to obtain, we propose instead to take advantage of partial annotations. The partial label learning framework assumes that we can obtain, for each data point, a set of candidate labels containing the correct one, a simpler setting than requiring a fully supervised training dataset. We study state-of-the-art multi-class classification methods, such as SVM, kNN, prototype-based, logistic regression and ensemble methods, and extend them where needed to the partial label learning framework. Moreover, we study the effect of incorporating the structure of the label set into the methods. We focus particularly on the hierarchical structure of the labels, as commonly observed in developmental processes. We show, on simulated and real datasets, that these extensions make it possible to learn from partially labeled data and to make predictions with high accuracy, particularly with a nonlinear prototype-based method. We demonstrate that our methods trained with partially annotated data reach the same performance as methods trained with fully annotated data. Finally, we study the level of uncertainty present in the partially annotated data, and derive some prescriptive results on the effect of this uncertainty on the accuracy of the partial label learning methods.
Overall, our findings show how hierarchical and non-hierarchical partial label learning strategies can help solve the problem of automated classification of single-cell transcriptomic profiles. Interestingly, these methods rely on a much less stringent type of annotated dataset than fully supervised learning methods.
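To make the partial label setting concrete: each training cell carries a set of candidate labels rather than one label. One simple way to extend kNN to this setting is to let each neighbor spread one vote uniformly over its candidate set. This is a minimal sketch of that generic variant, not necessarily the extension the authors use; the cell-type names, feature vectors and `pl_knn_predict` helper are hypothetical, and the hierarchical label structure is not modeled.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], Set[str]]  # (feature vector, candidate label set)


def pl_knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Partial-label kNN: each neighbor splits one vote over its candidate labels."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    scores = defaultdict(float)
    for _, candidates in neighbors:
        for label in candidates:
            scores[label] += 1.0 / len(candidates)
    # Sort labels first so ties are broken deterministically (alphabetically).
    return max(sorted(scores), key=lambda label: scores[label])


if __name__ == "__main__":
    training = [
        ([0.0, 0.0], {"T", "B"}),   # ambiguous annotation: T or B
        ([0.1, 0.0], {"T"}),
        ([0.0, 0.1], {"T", "NK"}),
        ([5.0, 5.0], {"B"}),
        ([5.1, 5.0], {"B", "NK"}),
    ]
    print(pl_knn_predict(training, [0.05, 0.05], k=3))  # "T"
    print(pl_knn_predict(training, [5.0, 5.0], k=2))    # "B"
```

Even though no neighbor is unambiguously labeled except one, the true label accumulates votes across candidate sets, which is the intuition behind learning from partial annotations.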


Subject(s)
Gene Expression Profiling , Supervised Machine Learning , Uncertainty , Logistic Models
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605640

ABSTRACT

Language models pretrained by self-supervised learning (SSL) have been widely used to study protein sequences, while few models have been developed for genomic sequences, and those were limited to single species. Lacking genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we developed SpliceBERT, a language model pretrained by masked language modeling on primary ribonucleic acid (RNA) sequences from 72 vertebrates, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown to be effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlights the importance of pretraining genomic language models on a diverse range of species and suggests that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
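Masked language modeling, the pretraining objective named above, hides some input tokens and trains the model to recover them. This minimal sketch shows only the input-corruption step for an RNA sequence tokenized one nucleotide per token; the 15% selection rate and the 80/10/10 mask/random/keep split follow the original BERT recipe and are assumptions here, since the abstract does not state SpliceBERT's exact settings.

```python
import random
from typing import List, Optional, Tuple

NUCLEOTIDES = "ACGU"
MASK = "[MASK]"


def mask_rna(seq: str, mask_prob: float = 0.15,
             seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style corruption of a nucleotide sequence (hypothetical helper).

    Returns (corrupted tokens, targets); targets[i] holds the original
    nucleotide at selected positions and None elsewhere (no loss there).
    """
    rng = random.Random(seed)
    tokens = list(seq)
    targets: List[Optional[str]] = [None] * len(tokens)
    for i, nucleotide in enumerate(seq):
        if rng.random() < mask_prob:
            targets[i] = nucleotide
            r = rng.random()
            if r < 0.8:
                tokens[i] = MASK                       # 80%: mask token
            elif r < 0.9:
                tokens[i] = rng.choice(NUCLEOTIDES)    # 10%: random nucleotide
            # remaining 10%: keep the original nucleotide
    return tokens, targets


if __name__ == "__main__":
    corrupted, labels = mask_rna("ACGUACGUAGGUAAGU", seed=1)
    print(corrupted)
    print(labels)
```

The model is then trained to predict the non-None targets from the corrupted tokens, so no human annotation is needed.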


Subject(s)
RNA Splicing , Vertebrates , Animals , Humans , Base Sequence , Vertebrates/genetics , RNA , Supervised Machine Learning
11.
Sensors (Basel) ; 24(7)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38610406

ABSTRACT

Wearable sensors could be beneficial for the continuous quantification of upper limb motor symptoms in people with Parkinson's disease (PD). This work evaluates the use of two inertial measurement units combined with supervised machine learning models to classify and predict a subset of MDS-UPDRS III subitems in PD. We attached the two compact wearable sensors to the dorsal part of each hand of 33 people with PD and 12 controls. Each participant performed six clinical movement tasks in parallel with an assessment of the MDS-UPDRS III. Random forest (RF) models were trained on the sensor data and motor scores. An overall accuracy of 94% was achieved in classifying the movement tasks. When employed for classifying the motor scores, the averaged area under the receiver operating characteristic curve ranged from 68% to 92%. Motor scores were additionally predicted using an RF regression model. In a comparative analysis, trained support vector machine models outperformed the RF models for specific tasks. Furthermore, our results surpass previously reported results in certain cases. The methods developed in this work serve as a base for future studies, where home-based assessments of pharmacological effects on motor function could complement regular clinical assessments.


Subject(s)
Parkinson Disease , Humans , Parkinson Disease/diagnosis , Machine Learning , Movement , Supervised Machine Learning , Hand
12.
Int J Mol Sci ; 25(7)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38612602

ABSTRACT

Molecular property prediction is an important task in drug discovery, and with the help of self-supervised learning methods, its performance can be improved by utilizing large-scale unlabeled datasets. In this paper, we propose a triple generative self-supervised learning method for molecular property prediction, called TGSS. Three encoders, a bi-directional long short-term memory recurrent neural network (BiLSTM), a Transformer, and a graph attention network (GAT), are used to pre-train the model on molecular sequence and graph structure data to extract molecular features. A variational autoencoder (VAE) is used to reconstruct features from the three models. In the downstream task, a feature fusion module assigns different weights to each feature to balance the information between different molecular features. In addition, to improve the interpretability of the model, atomic similarity heat maps are introduced to demonstrate the effectiveness and rationality of molecular feature extraction. We demonstrate the accuracy of the proposed method through comparative experiments on chemical and biological benchmark datasets.


Subject(s)
Benchmarking , Drug Discovery , Animals , Electric Power Supplies , Estrus , Supervised Machine Learning
13.
BMC Bioinformatics ; 25(1): 155, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38641616

ABSTRACT

BACKGROUND: Classification of binary data arises naturally in many clinical applications, such as patient risk stratification through ICD codes. One of the key practical challenges in data classification using machine learning is to avoid overfitting. Overfitting in supervised learning primarily occurs when a model learns random variations from noisy labels in training data rather than the underlying patterns. While traditional methods such as regularization and early stopping have demonstrated effectiveness in interpolation tasks, addressing overfitting in the classification of binary data, in which predictions always amount to extrapolation, demands extrapolation-enhanced strategies. One such approach is hybrid mechanistic/data-driven modeling, which integrates prior knowledge on input features into the learning process, enhancing the model's ability to extrapolate. RESULTS: We present NoiseCut, a Python package for noise-tolerant classification of binary data by employing a hybrid modeling approach that leverages solutions of defined max-cut problems. In a comparative analysis conducted on synthetically generated binary datasets, NoiseCut exhibits better overfitting prevention compared to the early stopping technique employed by different supervised machine learning algorithms. The noise tolerance of NoiseCut stems from a dropout strategy that leverages prior knowledge of input features and is further enhanced by the integration of max-cut problems into the learning process. CONCLUSIONS: NoiseCut is a Python package for the implementation of hybrid modeling for the classification of binary data. It facilitates the integration of mechanistic knowledge on the input features into learning from data in a structured manner and proves to be a valuable classification tool when the available training data is noisy and/or limited in size. This advantage is especially prominent in medical and biomedical applications where data scarcity and noise are common challenges. 
The codebase, illustrations, and documentation for NoiseCut are accessible for download at https://pypi.org/project/noisecut/ . The implementation detailed in this paper corresponds to the version 0.2.1 release of the software.


Subject(s)
Algorithms , Software , Humans , Supervised Machine Learning , Machine Learning
14.
Sci Rep ; 14(1): 9080, 2024 04 20.
Article in English | MEDLINE | ID: mdl-38643324

ABSTRACT

In developing countries, one-quarter of young women have suffered from anemia. However, available studies in Ethiopia have usually used traditional statistical methods. Therefore, this study aimed to employ multiple machine learning algorithms to identify the most effective model for predicting anemia among young girls in Ethiopia. A total of 5642 weighted samples of young girls from the 2016 Ethiopian Demographic and Health Survey dataset were utilized. The data underwent preprocessing, with 80% of the observations used for training the model and 20% for testing. Eight machine learning algorithms were employed to build and compare models. Model performance was assessed using evaluation metrics in Python software. Various data balancing techniques were applied, and the Boruta algorithm was used to select the most relevant features. In addition, association rule mining was conducted using the Apriori algorithm in R software. The random forest classifier, with an AUC of 82%, outperformed all other tested classifiers in predicting anemia. Region, poor wealth index, no formal education, unimproved toilet facility, rural residence, no use of contraceptive methods, religion, age, no media exposure, occupation, and a family size of more than 5 were the top attributes for predicting anemia. Association rule mining identified the top seven rules most frequently associated with anemia. The random forest classifier is the best model for predicting anemia, making it potentially valuable as a decision-support tool for relevant stakeholders, and emphasizing the identified predictors could be an important intervention to halt anemia among young girls.
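Association rules such as those mined above are ranked by support (how often the items co-occur), confidence (how often the consequent holds when the antecedent does) and lift (confidence relative to the consequent's base rate). The Apriori algorithm itself enumerates frequent itemsets; this minimal sketch only scores a single given rule, and the toy survey records below are invented for illustration, not drawn from the study.

```python
from typing import AbstractSet, List


def support(transactions: List[AbstractSet[str]], itemset: AbstractSet[str]) -> float:
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[AbstractSet[str]], antecedent: AbstractSet[str],
               consequent: AbstractSet[str]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, antecedent | consequent) / base


def lift(transactions: List[AbstractSet[str]], antecedent: AbstractSet[str],
         consequent: AbstractSet[str]) -> float:
    """Confidence divided by the consequent's overall support (>1 means positive association)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


if __name__ == "__main__":
    records = [  # hypothetical respondents, one attribute set each
        {"rural", "no_education", "anemia"},
        {"rural", "anemia"},
        {"rural", "no_education"},
        {"urban", "anemia"},
        {"urban"},
    ]
    print(confidence(records, {"rural"}, {"anemia"}))  # about 0.667
    print(lift(records, {"rural"}, {"anemia"}))        # about 1.11
```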


Subject(s)
Algorithms , Anemia , Humans , Adolescent , Female , Ethiopia/epidemiology , Supervised Machine Learning , Software , Anemia/diagnosis , Anemia/epidemiology
15.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38588559

ABSTRACT

MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence to regulatory function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models with suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited. RESULTS: Inspired by the success of comparative genomics, we show that augmenting genomic sequences with evolutionarily related sequences from other species, which we term phylogenetic augmentation, improves the performance of deep learning models trained on regulatory genomic sequences to predict high-throughput functional assay measurements. Additionally, we show that phylogenetic augmentation can rescue model performance when the training set is down-sampled and permits deep learning on a small real-world dataset, demonstrating that this approach improves data efficiency. Overall, this data augmentation method represents a solution for improving model performance that is applicable to many supervised deep-learning problems in genomics. AVAILABILITY AND IMPLEMENTATION: The open-source GitHub repository agduncan94/phylogenetic_augmentation_paper includes the code for rerunning the analyses here and recreating the figures.
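The idea of phylogenetic augmentation can be sketched as expanding each labeled training sequence with some of its orthologs from other species. This minimal illustration assumes that orthologs inherit the measured activity of the original sequence and that a fixed number are sampled per sequence; both choices, the data layout and the `phylo_augment` helper are hypothetical and not taken from the authors' repository.

```python
import random
from typing import Dict, List, Tuple


def phylo_augment(train: List[Tuple[str, str, float]],
                  orthologs: Dict[str, List[str]],
                  n_per_seq: int = 2, seed: int = 0) -> List[Tuple[str, float]]:
    """Add up to n_per_seq orthologous sequences per training example.

    train: (sequence id, sequence, measured activity) triples.
    orthologs: sequence id -> orthologous sequences from other species.
    Assumption: each ortholog is labeled with the original sequence's activity.
    """
    rng = random.Random(seed)
    augmented: List[Tuple[str, float]] = []
    for seq_id, seq, activity in train:
        augmented.append((seq, activity))
        homologs = orthologs.get(seq_id, [])
        for homolog in rng.sample(homologs, min(n_per_seq, len(homologs))):
            augmented.append((homolog, activity))
    return augmented


if __name__ == "__main__":
    training = [("r1", "ACGT", 1.5), ("r2", "GGCC", 0.2)]
    related = {"r1": ["ACGA", "ACTT", "TCGT"]}  # toy ortholog table
    for sequence, label in phylo_augment(training, related):
        print(sequence, label)
```

Whether to resample orthologs each epoch or fix them once is a design choice; resampling exposes the model to more sequence variation over training.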


Subject(s)
Deep Learning , Genomics , Phylogeny , Genomics/methods , Supervised Machine Learning , Humans
16.
Sci Adv ; 10(17): eadk4670, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38669334

ABSTRACT

The T cell receptor (TCR) repertoire is an extraordinarily diverse collection of TCRs essential for maintaining the body's homeostasis and response to threats. In this study, we compiled an extensive dataset of more than 4200 bulk TCR repertoire samples, encompassing 221,176,713 sequences, alongside 6,159,652 single-cell TCR sequences from over 400 samples. From this dataset, we then selected a representative subset of 5 million bulk sequences and 4.2 million single-cell sequences to train two specialized Transformer-based language models for bulk (CVC) and single-cell (scCVC) TCR repertoires, respectively. We show that these models successfully capture TCR core qualities, such as sharing, gene composition, and single-cell properties. These qualities are emergent in the encoded TCR latent space and enable classification into TCR-based qualities such as public sequences. These models demonstrate the potential of Transformer-based language models in TCR downstream applications.


Subject(s)
Receptors, Antigen, T-Cell , T-Lymphocytes , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/metabolism , Humans , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Supervised Machine Learning , Single-Cell Analysis/methods , Computational Biology/methods
17.
Comput Biol Med ; 174: 108460, 2024 May.
Article in English | MEDLINE | ID: mdl-38636330

ABSTRACT

Classifying fine-grained lesions is challenging due to minor and subtle differences in medical images, because learning features that distinguish lesions with very minor differences is difficult when training deep neural networks. Therefore, in this paper, we introduce a Fine-Grained Self-Supervised Learning (FG-SSL) method for classifying subtle lesions in medical images. The proposed method progressively trains the model through hierarchical blocks such that the cross-correlation between the fine-grained Jigsaw puzzle and regularized original images is close to the identity matrix. We also apply the hierarchical blocks for progressive fine-grained learning, which extract different information at each step, to supervised learning to discover subtle differences. Our method requires neither an asymmetric model nor a negative sampling strategy, and is not sensitive to batch size. We evaluate the proposed fine-grained self-supervised learning method through comprehensive experiments on various medical image recognition datasets. In our experiments, the proposed method performs favorably compared to existing state-of-the-art approaches on the widely used ISIC2018, APTOS2019, and ISIC2017 datasets.
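Driving the cross-correlation between two views' embeddings toward the identity matrix is the redundancy-reduction objective popularized by Barlow Twins: diagonal entries are pushed to 1 (the views agree per dimension) and off-diagonal entries to 0 (dimensions are decorrelated). This stdlib-only sketch shows that generic objective under stated assumptions: embeddings are batch-by-dimension lists, each dimension is standardized across the batch, and the off-diagonal weight of 5e-3 is borrowed from the Barlow Twins paper. It is not FG-SSL's hierarchical Jigsaw pipeline.

```python
import math
from typing import List

Matrix = List[List[float]]  # batch_size x embedding_dim


def standardize(z: Matrix) -> Matrix:
    """Standardize each embedding dimension (column) across the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[i][j] for i in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) + 1e-8
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def cross_correlation(za: Matrix, zb: Matrix) -> Matrix:
    """d x d cross-correlation between standardized embeddings of two views."""
    a, b = standardize(za), standardize(zb)
    n, d = len(a), len(a[0])
    return [[sum(a[i][p] * b[i][q] for i in range(n)) / n for q in range(d)]
            for p in range(d)]


def identity_loss(za: Matrix, zb: Matrix, lambda_offdiag: float = 5e-3) -> float:
    """Sum of (1 - C_ii)^2 plus weighted sum of C_ij^2 for i != j."""
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambda_offdiag * off_diag


if __name__ == "__main__":
    view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    view_b = [[1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, 1.0]]
    print(identity_loss(view_a, view_a))  # near 0: C is the identity
    print(identity_loss(view_a, view_b))  # near 4: second dimension anti-correlated
```

Because the loss only compares each view with itself through C, no negative pairs are needed, consistent with the abstract's statement that the method avoids negative sampling.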


Subject(s)
Supervised Machine Learning , Humans , Neural Networks, Computer , Image Interpretation, Computer-Assisted/methods , Algorithms , Image Processing, Computer-Assisted/methods
18.
J Neural Eng ; 21(2)2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38588700

ABSTRACT

Objective. The instability of EEG acquisition devices may cause information loss in some channels or frequency bands of the collected EEG. Existing models may ignore this phenomenon, which leads to overfitting and poor generalization. Approach. Multiple self-supervised learning tasks are introduced in the proposed model to enhance the generalization of EEG emotion recognition and reduce overfitting to some extent. First, channel masking and frequency masking are introduced to simulate the information loss in certain channels and frequency bands resulting from EEG instability, and two self-supervised feature reconstruction tasks based on masked graph autoencoders (GAE) are constructed to enhance the generalization of the shared encoder. Second, to take full advantage of the complementary information in these two self-supervised tasks and ensure reliable feature reconstruction, a weight sharing (WS) mechanism is introduced between the two graph decoders. Third, an adaptive weight multi-task loss (AWML) strategy based on homoscedastic uncertainty is adopted to combine the supervised learning loss and the two self-supervised learning losses, further enhancing performance. Main results. Experimental results on the SEED, SEED-V, and DEAP datasets demonstrate that: (i) the proposed model generally achieves higher average emotion classification accuracy than various baselines in both subject-dependent and subject-independent scenarios; (ii) each key module contributes to the performance of the proposed model; (iii) it achieves higher training efficiency and significantly lower model size and computational complexity than the state-of-the-art (SOTA) multi-task-based model; and (iv) its performance is less sensitive to the key parameters.
Significance. The introduction of self-supervised learning tasks helps to enhance the generalization of the EEG emotion recognition model and to reduce overfitting to some extent, and the approach can be adapted to other EEG-based classification tasks.
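A loss weighting based on homoscedastic uncertainty is commonly written, following Kendall, Gal and Cipolla, as the sum over tasks of L_i / (2 sigma_i^2) + log sigma_i, where each sigma_i is learned. This minimal sketch uses the usual parameterization s_i = log(sigma_i^2) with fixed values so the arithmetic is easy to follow; in training, the s_i would be trainable parameters. The abstract does not give AWML's exact formula, so treat this as the generic form rather than the authors' implementation.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Combine task losses using log-variances s_i = log(sigma_i^2).

    Each term is 0.5 * exp(-s_i) * L_i + 0.5 * s_i, which equals
    L_i / (2 * sigma_i^2) + log(sigma_i).
    """
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))


if __name__ == "__main__":
    # Hypothetical values: one supervised loss and two reconstruction losses.
    losses = [2.0, 0.5, 0.5]
    best = [math.log(l) for l in losses]  # minimizer for fixed losses: s_i = log(L_i)
    print(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0]))
    print(uncertainty_weighted_loss(losses, best))
```

The log sigma_i term stops the model from shrinking every weight to zero, so tasks with larger, noisier losses are down-weighted without being ignored.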


Subject(s)
Electroencephalography , Emotions , Supervised Machine Learning , Supervised Machine Learning/standards , Datasets as Topic , Humans
19.
Comput Methods Programs Biomed ; 249: 108141, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38574423

ABSTRACT

BACKGROUND AND OBJECTIVE: Lung tumor annotation is a key upstream task for further diagnosis and prognosis. Although deep learning techniques have advanced the automation of lung tumor segmentation, challenges still impede their application in clinical practice, such as the lack of prior annotations for model training and the lack of data sharing among centers. METHODS: In this paper, we use data from six centers to design a novel federated semi-supervised learning (FSSL) framework with dynamic model aggregation that improves segmentation performance for lung tumors. Specifically, we propose a dynamically updated algorithm for model parameter aggregation in FSSL that takes advantage of both the quality and the quantity of client data. Moreover, to increase the accessibility of data in the federated learning (FL) network, we explore the FAIR data principles, which previous federated methods have not addressed. RESULT: The experimental results show that the segmentation performance of our model in the six centers is 0.9348, 0.8436, 0.8328, 0.7776, 0.8870, and 0.8460, respectively, which is superior to traditional deep learning methods and recent federated semi-supervised learning methods. CONCLUSION: The experimental results demonstrate that our method outperforms existing FSSL methods. In addition, our proposed dynamic update strategy effectively utilizes the quality and quantity information of client data and is effective for lung tumor segmentation. The source code is available at https://github.com/GDPHMediaLab/FedDUS.
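The abstract does not state the aggregation rule, so the following Python sketch only illustrates the general idea of weighting clients by both data quantity and data quality when averaging model parameters. The product rule n_k * q_k, the per-client quality score q_k (for example, a validation score), and the function names are hypothetical assumptions, not the FedDUS algorithm; setting every q_k to 1 recovers plain FedAvg-style averaging by sample count.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def aggregation_weights(num_samples: List[int], quality: List[float]) -> List[float]:
    """Hypothetical rule: weight each client by quantity * quality, normalized to sum to 1."""
    raw = [n * q for n, q in zip(num_samples, quality)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one client needs positive quantity and quality")
    return [r / total for r in raw]


def aggregate(client_params: List[Params], weights: List[float]) -> Params:
    """Weighted element-wise average of client parameters (all clients share keys and shapes)."""
    merged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(weights, client_params))
            for i in range(length)
        ]
    return merged
```

For example, `aggregation_weights([100, 300], [0.9, 0.6])` gives weights of roughly 1/3 and 2/3, so averaging the parameter vectors [1.0, 0.0] and [4.0, 3.0] with those weights yields approximately [3.0, 2.0].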


Subject(s)
Algorithms , Lung Neoplasms , Humans , Automation , Lung Neoplasms/diagnostic imaging , Software , Supervised Machine Learning , Tomography, X-Ray Computed , Image Processing, Computer-Assisted
20.
Artif Intell Med ; 151: 102828, 2024 May.
Article in English | MEDLINE | ID: mdl-38564879

ABSTRACT

Reliable large-scale cell detection and segmentation are the fundamental first step toward understanding biological processes in the brain. The ability to phenotype cells at scale can accelerate preclinical drug evaluation and system-level brain histology studies. The impressive advances in deep learning offer a practical solution to cell image detection and segmentation. Unfortunately, categorizing cells and delineating their boundaries to train deep networks is an expensive process that requires skilled biologists. This paper presents a novel self-supervised Dual-Loss Adaptive Masked Autoencoder (DAMA) for learning rich features from multiplexed immunofluorescence brain images. DAMA's objective function minimizes the conditional entropy in pixel-level reconstruction and feature-level regression. Unlike existing self-supervised learning methods based on a random image masking strategy, DAMA employs a novel adaptive mask sampling strategy to maximize mutual information and learn effectively from brain cell data. To the best of our knowledge, this is the first self-supervised learning method developed for multiplexed immunofluorescence brain images. Our extensive experiments demonstrate that DAMA features enable superior cell detection, segmentation, and classification performance without requiring many annotations. In addition, to examine the generalizability of DAMA, we also experimented on TissueNet, a multiplexed imaging dataset composed of two-channel fluorescence images from six distinct tissue types, captured using six different imaging platforms. Our code is publicly available at https://github.com/hula-ai/DAMA.
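To make the contrast between random and adaptive masking, and the two-part objective, more concrete, the following Python sketch compares uniform patch masking with a score-proportional alternative and combines a pixel-level reconstruction error with a feature-level regression error. Everything here is an assumption for illustration: the per-patch scores (e.g., a previous reconstruction error), the use of mean squared error for both terms, the weight `lam`, and all function names are hypothetical, and the sketch does not reproduce DAMA's entropy- and mutual-information-based formulation.

```python
import random
from typing import List, Sequence

Patch = List[float]  # one flattened image patch (pixel values)


def random_mask(num_patches: int, ratio: float, rng: random.Random) -> List[int]:
    """Baseline masking: choose patch indices uniformly at random."""
    k = int(round(ratio * num_patches))
    return sorted(rng.sample(range(num_patches), k))


def adaptive_mask(scores: Sequence[float], ratio: float, rng: random.Random) -> List[int]:
    """Hypothetical adaptive masking: sample without replacement, proportional to score.

    Higher-scoring patches (e.g. harder to reconstruct) are masked more often;
    zero-score patches are effectively never picked while any positive score remains.
    """
    k = int(round(ratio * len(scores)))
    remaining = list(range(len(scores)))
    chosen: List[int] = []
    for _ in range(k):
        weights = [max(scores[i], 0.0) for i in remaining]
        if sum(weights) > 0.0:
            pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        else:
            pos = rng.randrange(len(remaining))  # uniform fallback when all scores are zero
        chosen.append(remaining.pop(pos))
    return sorted(chosen)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_loss(
    patches: Sequence[Patch],
    recon: Sequence[Patch],
    masked: Sequence[int],
    target_feats: Sequence[float],
    pred_feats: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Pixel-level reconstruction error on masked patches + lam * feature-level regression error."""
    pixel_term = sum(mse(recon[i], patches[i]) for i in masked) / len(masked)
    feature_term = mse(pred_feats, target_feats)
    return pixel_term + lam * feature_term
```

As a small check, with patches [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], reconstructions [[1.0, 1.0], [2.0, 4.0], [0.0, 3.0]], masked indices [1, 2], feature targets [1.0, 2.0], predictions [1.0, 3.0], and `lam = 2.0`, `dual_loss` returns 3.25 + 2.0 * 0.5 = 4.25.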


Subject(s)
Brain , Brain/diagnostic imaging , Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Humans , Deep Learning , Animals , Algorithms , Neuroimaging/methods